Abstract.

This paper presents a detailed error analysis of geometric hashing in the domain of 2D object
recognition. Earlier analysis has shown that these methods are likely to produce false positive
hypotheses when one allows for uniform bounded sensor error and moderate amounts of extraneous 
clutter points. These false positives must be removed by a subsequent verification step. Later work 
has incorporated an explicit 2D Gaussian instead of a bounded error model to improve performance 
of the hashing method. 

The contribution of this paper is to analytically derive the probability of false positives and 
negatives as a function of the number of model features, image features, and occlusion, under the 
assumption of 2D Gaussian noise and a particular method of evidence accumulation. A distinguish- 
ing feature of this work is that we make no assumptions about prior distributions on the model 
space, nor do we assume even the presence of the model. The results are presented in the form of 
ROC (receiver-operating characteristic) curves, from which several results can be extracted; firstly, 
they demonstrate that the 2D Gaussian error model performs better for high clutter levels and 
degrades more gracefully as compared to the uniform bounded error model for the same conditions. 
They also directly indicate the optimal performance that can be achieved for a given clutter and 
occlusion rate, and how to choose the thresholds to achieve the desired rates. 

Lastly, we verify these ROC curves in the domain of simulated images.

1 Introduction



Geometric hashing is a technique introduced in [LSW87], 
[HW88], to solve the problem of recognizing objects and 
their associated poses in cluttered scenes. The main idea 
behind the technique is that instead of checking every 
possible correspondence of image to model features to 
establish a model pose and then checking the image for 
supporting evidence, the recognition process is considerably
sped up by splitting it into two stages. In the
first stage, a database of all possible views of the model
is precomputed and stored in a hash table. In the second
stage, recognition consists of using 2D image features to
index into the hash table in order to vote for possible model poses.

However, under the assumption of uniform bounded 
sensor error, performance degrades rapidly with even a 
moderate amount of clutter [GHJ91]. Intuitively, the 
reason is that the error causes the point entries in the 
hash table to blur into regions, making the table denser 
and increasing the chances that a random image point 
(i.e., a point not arising from the model) will corroborate 
an incorrect hypothesis. 

In this paper we analyze the effect of a more realis- 
tic noise model on these techniques. The question we 
address in the paper is, what kind of performance can 
we expect from the techniques as a function of the num- 
ber of model features and clutter features (i.e., signal to 
noise ratio)? 

To answer the question, first we briefly present the 
original hashing algorithms, then we show how to mod- 
ify them in the presence of sensor error. We model the 
error as a 2D Gaussian distributed vector, which is often 
a more realistic model than the uniform bounded error 
model used in the earlier analysis [GHJ91]. A voting 
function for accumulating evidence for hypotheses based 
on the error model is presented. (Similar approaches 
to extending geometric hashing have been explored in 
[CHS90], [RH91].) This is the background for the main
question, which is: how does one determine a reliable point
at which to separate correct from incorrect hypotheses?
This question is relevant in the noiseless case as well: as- 
sume there is a 25% occlusion rate, and we are searching 
for a model of size 20. Do we decide that a hypothesis is 
true after seeing 15 corroborating features, or 12, or 10? 
Clearly, the lower the acceptance threshold, the higher 
the probability of false positives, and the higher the ac- 
ceptance threshold, the higher the probability that we 
will miss a correct hypothesis, i.e. of false negatives. 
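To make the trade-off concrete, the following is a minimal sketch of the noiseless example above. It assumes, purely for illustration, that each of the 20 model features is visible independently with probability 0.75 (a binomial occlusion model that is not part of the analysis in this paper) and ignores clutter. The function name `miss_probability` is hypothetical.

```python
from math import comb


def miss_probability(n_features, p_visible, threshold):
    """Probability that fewer than `threshold` of `n_features` points are visible.

    Assumption (illustrative only): each point is visible independently
    with probability `p_visible`, so the visible count is binomial.
    """
    return sum(
        comb(n_features, k) * p_visible**k * (1.0 - p_visible) ** (n_features - k)
        for k in range(threshold)
    )


if __name__ == "__main__":
    # False negative rate for the three candidate acceptance thresholds.
    for threshold in (15, 12, 10):
        print(threshold, miss_probability(20, 0.75, threshold))
```

Under this simplified model, lowering the threshold reduces the chance of missing a correct hypothesis; the cost, which the sketch does not model, is a higher chance that clutter pushes an incorrect hypothesis over the threshold.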

To find the optimal acceptance threshold for a fixed 
occlusion rate and a fixed number of model and clutter 
points, we use the given error model and voting scheme 
to derive expressions for the probability density func- 
tions of weights of positive and negative hypotheses. We 
then vary the acceptance threshold and find the proba- 
bility of false positives and true positives for that thresh- 
old. The results are plotted as ROC curves, which indi- 
cate the optimal performance that can be achieved for 
the given level of occlusion, clutter, and number of model 
points. 



2 Statement of the Geometric Hashing 
Algorithm 

We begin by reviewing the original geometric hash- 
ing algorithm assuming exact measurements [LSW87], 
[HW88]. The algorithm consists of two stages, a model 
preprocessing stage and a recognition stage. For simplic- 
ity, we restrict attention to planar objects in arbitrary 
3D pose. The model representation consists of a set of 
(x,y) points in what we will call model space, which is 
simply some fixed coordinate system. The points can be 
corners, points of high curvature, or points of inflection 
of the 2D model. 

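Assuming orthographic projection, we can represent
the image location $[u_i, v_i, 1]^T$ of each model point
$[x_i, y_i, 1]^T$ with a simple linear transformation:

\[
\begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix} =
\begin{bmatrix} a & b & t_x \\ c & d & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}
\]

where the upper left of the transformation matrix is a
2x2 non-singular matrix, and $[t_x, t_y]^T$ is the translation
vector. This is because the projection onto the $z = 0$
plane of a rotated, scaled, and translated point $(x, y, 0, 1)$
simplifies to

\[
\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} s r_{11} & s r_{12} & t_x \\ s r_{21} & s r_{22} & t_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} x \\ y \\ 1 \end{bmatrix}
\]

where $r_{ij}$ are entries of the rotation matrix and $s$ is a positive scale factor. It is a well known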
fact that if a point has coordinates X with respect to 
a given basis, then a linear transformation on the entire 
space leaves the coordinates of the point unchanged with 
respect to the transformed coordinates of the basis. The 
coordinates of X with respect to the basis are called 
affine coordinates, and it is their invariance under linear 
operations which is utilized in geometric hashing. 
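The invariance can be checked numerically. The sketch below solves for the affine coordinates of a point with respect to a three-point basis and confirms that they are unchanged when the same affine map is applied to all four points. The helper names `affine_coords` and `apply_affine`, and the particular matrix and translation, are illustrative choices, not taken from the paper.

```python
def affine_coords(p, b0, b1, b2):
    """Solve p = b0 + alpha * (b1 - b0) + beta * (b2 - b0) by Cramer's rule."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    vx, vy = b2[0] - b0[0], b2[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    det = ux * vy - uy * vx
    if det == 0:
        raise ValueError("basis points are collinear")
    alpha = (px * vy - py * vx) / det
    beta = (ux * py - uy * px) / det
    return alpha, beta


def apply_affine(p, matrix, translation):
    """Apply a 2x2 matrix followed by a translation to the point p."""
    return (
        matrix[0][0] * p[0] + matrix[0][1] * p[1] + translation[0],
        matrix[1][0] * p[0] + matrix[1][1] * p[1] + translation[1],
    )


if __name__ == "__main__":
    basis = [(0.0, 0.0), (4.0, 0.0), (1.0, 3.0)]
    point = (5.0, 4.0)
    matrix, translation = ((2.0, 1.0), (-1.0, 3.0)), (5.0, -2.0)

    before = affine_coords(point, *basis)
    moved_basis = [apply_affine(b, matrix, translation) for b in basis]
    after = affine_coords(apply_affine(point, matrix, translation), *moved_basis)
    print(before, after)  # the two pairs agree up to rounding
```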

In the preprocessing stage, the hash table is constructed
as follows: Every ordered triple of model points
is used as a basis, and the affine coordinates $(\alpha, \beta)$ of all
other model points are computed with respect to each
basis. Thus, if $\vec{m}_0$, $\vec{m}_1$ and $\vec{m}_2$ are basis points, then we
represent any other feature point by

\[ \vec{m}_i = \vec{m}_0 + \alpha_i(\vec{m}_1 - \vec{m}_0) + \beta_i(\vec{m}_2 - \vec{m}_0). \]

The basis (i.e., the 3 model points) is entered into
the hash table at each $(\alpha_i, \beta_i)$ location. Intuitively, the
invariance of the affine coordinates of model points with
respect to 3 of its own points as basis is being used to
"precompute" all possible views of the model in an image.
The actual algorithm is:

• for every ordered model triplet $B_k = (m_0, m_1, m_2)$,
- for every other model point $m_j$

(i) find coordinates $m_j = (\alpha_j, \beta_j)$ with respect
to basis $B_k$

(ii) enter basis $B_k$ at location $(\alpha_j, \beta_j)$ in the
hash table.

The running time for this stage is $O(m^4)$, where
$m$ = number of model points.
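The preprocessing loop can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the paper: the hash table is a dictionary keyed by quantized affine coordinates, the bin width `bin_size` is a hypothetical parameter, and collinear bases are simply skipped.

```python
from collections import defaultdict
from itertools import permutations


def affine_coords(p, b0, b1, b2):
    """Affine coordinates of p in the basis (b0, b1, b2); None if degenerate."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    vx, vy = b2[0] - b0[0], b2[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    det = ux * vy - uy * vx
    if det == 0:
        return None  # collinear basis points
    return (px * vy - py * vx) / det, (ux * py - uy * px) / det


def build_hash_table(model, bin_size=0.25):
    """Enter every ordered basis at the (alpha, beta) bin of every other point.

    `bin_size` is an assumed quantization step; the paper does not fix one.
    The two nested loops visit O(m^3) bases times O(m) points, i.e. O(m^4).
    """
    table = defaultdict(list)
    for basis in permutations(range(len(model)), 3):
        b0, b1, b2 = (model[i] for i in basis)
        for j, p in enumerate(model):
            if j in basis:
                continue
            coords = affine_coords(p, b0, b1, b2)
            if coords is None:
                continue
            key = (round(coords[0] / bin_size), round(coords[1] / bin_size))
            table[key].append(basis)  # store the basis as a tuple of model indices
    return table


if __name__ == "__main__":
    model = [(0, 0), (4, 0), (1, 3), (5, 4), (2, 7)]
    table = build_hash_table(model)
    print(sum(len(entries) for entries in table.values()))
```

For a model of 5 points in general position there are 5 x 4 x 3 = 60 ordered bases and 2 remaining points per basis, so the table holds 120 entries.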

At recognition time, the image is processed to ex- 
tract 2D feature points which are used to index into the 
table. The choice of features used must be determined 
by what points were used as model feature points, i.e., 
if corners were used as model features, then one might 
take the intersection of all line segments to be the image
feature points. Every image triple is then taken to
be a basis, and the affine coordinates of all other image
points are computed with respect to the basis to index
into the hash table and "vote" for all bases found there.
Intuitively we are searching for any three image points 
which come from the model, and using the hash table to 
verify hypothesized triples of image points as instances 
of model points. Such an image triple will yield a large 
number of votes for its corresponding model basis. In 
particular: 

• for every unordered image triplet $(i_0, i_1, i_2)$

(a) for every other image point $i_j$

(i) find coordinates $i_j = (\alpha_j, \beta_j)$ with respect
to basis $(i_0, i_1, i_2)$

(ii) index into the hash table at location $(\alpha_j, \beta_j)$
and increment a histogram count for all
bases found there.

(b) If the weight of the vote for any basis $B_k$ is sufficiently
high, stop and output the correspondence
between triple $(i_0, i_1, i_2)$ and basis $B_k$ as
a correct hypothesis.
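The recognition stage can be sketched in the same spirit. The code below is a self-contained illustration for exact (noise-free) measurements; the quantization step `BIN`, the vote threshold, and all function names are assumptions made for the example. The test data use integer coordinates and an integer affine map so that model and image points fall into identical bins.

```python
from collections import Counter, defaultdict
from itertools import combinations, permutations

BIN = 0.25  # assumed quantization step for (alpha, beta)


def affine_coords(p, b0, b1, b2):
    """Affine coordinates of p in the basis (b0, b1, b2); None if degenerate."""
    ux, uy = b1[0] - b0[0], b1[1] - b0[1]
    vx, vy = b2[0] - b0[0], b2[1] - b0[1]
    px, py = p[0] - b0[0], p[1] - b0[1]
    det = ux * vy - uy * vx
    if det == 0:
        return None
    return (px * vy - py * vx) / det, (ux * py - uy * px) / det


def key_of(coords):
    """Quantize affine coordinates to a hash table bin."""
    return (round(coords[0] / BIN), round(coords[1] / BIN))


def build_table(model):
    """Preprocessing: enter every ordered model basis at every other point's bin."""
    table = defaultdict(list)
    for basis in permutations(range(len(model)), 3):
        b0, b1, b2 = (model[i] for i in basis)
        for j, p in enumerate(model):
            coords = None if j in basis else affine_coords(p, b0, b1, b2)
            if coords is not None:
                table[key_of(coords)].append(basis)
    return table


def votes_for_triple(table, image, triple):
    """Step (a): let every other image point vote for the bases in its bin."""
    b0, b1, b2 = (image[i] for i in triple)
    votes = Counter()
    for j, p in enumerate(image):
        if j in triple:
            continue
        coords = affine_coords(p, b0, b1, b2)
        if coords is None:
            break  # degenerate image basis, no votes
        votes.update(table.get(key_of(coords), ()))
    return votes


def recognize(table, image, threshold):
    """Step (b): return the first (image triple, model basis, votes) over threshold."""
    for triple in combinations(range(len(image)), 3):
        votes = votes_for_triple(table, image, triple)
        if votes:
            basis, count = votes.most_common(1)[0]
            if count >= threshold:
                return triple, basis, count
    return None


if __name__ == "__main__":
    model = [(0, 0), (4, 0), (1, 3), (5, 4), (2, 7)]
    # Image: the model under an integer affine map, plus two clutter points.
    image = [(2 * x + y + 10, -x + 3 * y + 20) for x, y in model] + [(37, 5), (-8, 41)]
    print(recognize(build_table(model), image, threshold=2))
```

With exact data, a correct image triple collects one vote per remaining model point. Choosing the threshold once sensor error and clutter are present is the question the rest of the analysis addresses.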

In some versions of the algorithm, the hypothesis that 
is output subsequently undergoes a verification stage be- 
fore being accepted as correct. Note that we need to 
order the points either at the preprocessing stage or at 
recognition time, but not both (or there would be a six- 
fold redundancy of correspondences). We choose to or- 
der the points at the preprocessing stage and enter every 
model point with respect to a single unordered basis set 
6 times, once for every ordering of the basis set. This 
makes the table 6 times denser, but then at recognition 
time we need only to choose an unordered image triple 
and impose a single arbitrary ordering upon it. This 
way, when we use the remaining image points to index 
into the hash table, we vote for the ordering of the model
basis set as well as the model basis set itself. The termination
condition for accepting a correspondence of bases
(and hence a pose of the object) and the confidence of 
the result are exactly the issues we investigate in this 
paper. 

3 Modifications to the Algorithms in 
the Presence of Error 

We now assume sensor uncertainty, namely, that a model 
feature appears at its projected location, but displaced 
by an error vector drawn from some distribution. Without
noise, a correct matching (i.e., a correct pairing of 3
model basis points and 3 image basis points) yields a single
$(x, y)$ location for a projected fourth model point in
the image and a single $(\alpha, \beta)$ location for the same point
in the hash table. Under the assumption of circular uniform
bounded error, [GHJ91] showed that a matching
gives rise to a circular disk of possible image locations 
for any projected fourth model point, and that this cir- 
cular disk in the image translates to an ellipsoidal range 
of affine coordinates in the hash table. Therefore, in 
practice, the bases should be stored (weighted by some 
function of the error distribution) at all possible affine 
locations for the fourth point. However, it is simpler 
to analyze the probability that a uniformly distributed 
random point will fall into a given circle, than to trans- 
late the uniform distribution into a distribution on affine 
coordinates, and to analyze the probability that the ran- 
dom point with affine coordinates drawn from this dis- 
tribution will fall into a given ellipse. It is clear that 
the answer is the same, but that the first space is more 
manageable than the second. We will therefore choose 
to do the analysis using the simpler space, keeping in 
mind that the results found in this fashion are true of 
the analysis done in hash table space as well. One con- 
sequence of this is that the analysis will apply equally 
well to alignment and to geometric hashing. 

In the modified algorithm, instead of incrementing a 
histogram count for every eligible basis by a full vote, 
we increment the basis count by a number between 0
and 1 according to some "goodness" criterion, which in
our case is a function of the distance of the point from 
its expected location. Because of this, we must look 
at the density function of the accumulated values for 
correct and incorrect hypotheses, instead of the discrete 
probability of a particular vote. We will use the term 
"weight of a hypothesis" to denote this concept. 

4 Overview of the Analysis 

The main claim of the paper is supported by the argu- 
ment whose steps are as follows: 

(a) A 2D circular Gaussian distribution is often a more
accurate model for sensor error than a model
assuming a bounded uniform distribution [Wel91]. While
a bounded model leads to conservative estimates of performance,
a Gaussian model may lead to more practical
estimates.

(b) Using this Gaussian distribution, the following is
true: given a correspondence between 3 image points and
3 model points (referred to as a hypothesis for the rest of
the article), and assuming a fixed standard deviation $\sigma_0$
for the sensed error of the image points, the location of
a fourth model point with affine coordinates $(\alpha, \beta)$ (with
respect to the 3 image basis points) will also have a 2D
circular normal distribution with standard deviation $\sigma_e$:

\[ \sigma_e = \sigma_0 \sqrt{(1 - \alpha - \beta)^2 + \alpha^2 + \beta^2 + 1} \]

Note that the possible distance of a fourth model
point from its predicted location is now unbounded. In
our scheme we will pick a cutoff search distance of $2\sigma_e$
for possible matching image features, which will imply a
probability of false negative identification of 13.5% for a
single point.
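A short numerical sketch of this step follows. It assumes, as in equation (1) of Section 5, that the projected error is a weighted sum of four independent circular Gaussian error vectors with weights $(1 - \alpha - \beta)$, $\alpha$, $\beta$ and 1, so that the variances add. The function names are hypothetical. The miss probability uses the fact that the radius of a 2D circular Gaussian is Rayleigh distributed, so $P(r > k\sigma_e) = e^{-k^2/2}$.

```python
import math


def projected_sigma(alpha, beta, sigma0):
    """Standard deviation of the projected error for affine coordinates (alpha, beta).

    Assumes independent circular Gaussian sensor errors with std `sigma0`;
    the variance of a weighted sum is the sum of squared weights times sigma0^2.
    """
    weights_sq = (1.0 - alpha - beta) ** 2 + alpha**2 + beta**2 + 1.0
    return sigma0 * math.sqrt(weights_sq)


def miss_probability(cutoff_in_sigmas=2.0):
    """P(r > k * sigma_e) for a 2D circular Gaussian (Rayleigh radius)."""
    return math.exp(-(cutoff_in_sigmas**2) / 2.0)


if __name__ == "__main__":
    print(projected_sigma(1.0 / 3.0, 1.0 / 3.0, 2.5))  # smallest value for sigma0 = 2.5
    print(miss_probability())  # exp(-2), about 0.135
```

The sum of squared weights is smallest at $\alpha = \beta = 1/3$, which gives the lower end of the range of $\sigma_e$ for a given $\sigma_0$.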

(c) As in [GHJ91], we find the density of $\sigma_e$ in two
cases: when the values of $\sigma_e$ come from a model appearing
in the image, $f_H(\sigma_e)$, and when they result from
incorrect hypotheses, $f_{\bar{H}}(\sigma_e)$. The two
density functions are

\[ f_H(\sigma_e) = (b_1 \sigma_e)^{-2}, \qquad f_{\bar{H}}(\sigma_e) = (b_0 \sigma_e)^{-4} \]

where $b_1 = 0.58$, $b_0 = 0.35$.

(d) Next, we modify the recognition algorithm so that
it assigns weights to points found within the error disk,
as opposed to a single 1/0 vote. We choose to use:

\[ g(d) = \frac{1}{2\pi\sigma_e^2} e^{-\frac{d^2}{2\sigma_e^2}} \]

where $d$ = distance from the point's hypothesized to actual
location. This is the value of the 2D Gaussian density
function whose center is at the hypothesized location.
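As a minimal sketch, the weighted vote can be written as a function of the distance and of the projected standard deviation. The code assumes the weight is the circular 2D Gaussian density evaluated at distance `d`, set to zero beyond the search cutoff of two standard deviations; the name `vote_weight` and the `cutoff_in_sigmas` parameter are illustrative.

```python
import math


def vote_weight(d, sigma_e, cutoff_in_sigmas=2.0):
    """Weight contributed by an image point found at distance d from the prediction.

    The weight is the 2D circular Gaussian density at distance d; points
    beyond the search cutoff are not found and contribute nothing.
    """
    if d > cutoff_in_sigmas * sigma_e:
        return 0.0
    return math.exp(-(d * d) / (2.0 * sigma_e**2)) / (2.0 * math.pi * sigma_e**2)


if __name__ == "__main__":
    sigma_e = 3.0
    for d in (0.0, 3.0, 6.0, 6.1):
        print(d, vote_weight(d, sigma_e))
```

Note that the weights are bounded above by $1/(2\pi\sigma_e^2)$ and, within the cutoff, bounded below by $e^{-2}/(2\pi\sigma_e^2)$, which is why stable bases with small $\sigma_e$ can contribute larger votes.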

(e) Define random variables $V_H$ = the weight that
a model point's projection contributes to its supporting
basis, and $V_{\bar{H}}$ = the weight that a random image point
contributes to a given basis. To demonstrate what this
means, in the simpler bounded uniform error case, the
distribution of $V_H$ is:

\[ f(V_H = v) = \begin{cases} 1 - c & v = 1 \\ c & v = 0 \end{cases} \]

i.e., the probability that a fourth model point will
contribute a weight of 1 to a correct hypothesis is $1 - c$,
where $c$ is the probability of occlusion. A more complicated
expression holds for $V_{\bar{H}}$ [GHJ91].

In the Gaussian error scheme with a cutoff distance
of $2\sigma_e$, these distributions are derived in Section 7; there,
$s_1, s_2$ are the minimum and maximum allowable values
for $\sigma_e$, respectively.

(f) The probability density function for the weight
of an incorrect hypothesis is calculated as follows: For a
single random point in an image with $m$ projected model
points, the distribution $f(V_{\bar{H}_m} = v)$ is derived in Section 8.
Dropping $n$ points convolves this distribution with itself
$n - 3$ times:

\[ f(W_{\bar{H}_{m,n}} = v) = \bigotimes_{i=1}^{n-3} f(V_{\bar{H}_m}) \]

For a model of size $m$ and a correct hypothesis in
an image with $n$ points, the weight of the total vote
for this hypothesis is the sum of weights over all $m - 3$
other projected model points plus the sum of the weights
of the $n - m$ clutter points. We will call this random
variable $W_{H_{m,n}} = \sum_{i=1}^{m-3} V_{H_i} + \sum_{i=1}^{n-m} V_{\bar{H}_i}$. Though the
random variables $V_{H_i}$ are not independent, we make the
simplifying assumption that they are, and proceed with
the analysis. Assuming independence, the sum follows
the distribution:

\[ f(W_{H_{m,n}} = v) = \bigotimes_{i=1}^{m-3} f(V_{H_i}) \otimes \bigotimes_{i=1}^{n-m} f(V_{\bar{H}_i}) \]

The validity of this assumption will be examined in a
later section of this paper. We will use the central limit
theorem to avoid actually having to compute this distribution,
and will assume that the result of the convolution
is Gaussian.

(g) Given these two distributions, we can now find the 
probability that an incorrect hypothesis will look like a 
correct one. The problem of deciding whether a sen- 
sor basis corresponds to a particular model basis is a 
simple binary hypothesis testing problem, for which we 
can easily find an optimum decision rule. We postpone 
the discussion of this rule until a later section; for now 
we will simply state that the decision rule yields a fixed 
probability of false positive ($P_F$) versus detection ($P_D$)
as a function of threshold. It is also shown that this 
decision rule performs better for high clutter levels and 
degrades more gracefully as compared to the analogous 
optimal decision rule in the uniform bounded error case. 

(h) Now let us step back and look at the overall decision
problem. We pick three image points, and accumulate
weights for $\binom{m}{3} \times 6$ bases. Suppose we are willing
to verify (by alignment or any other verification technique)
all bases that pass the initial test, as long as there
are $\le k$ of them. Then, an overall false positive is the
combined event that the three image points being tested
do not arise from the model, yet more than $k$ model
bases "look good". An overall true positive is the combined
event that the three image points do arise from
the model, that $\le k$ model bases pass the test, and
one of them is the correct one. We will call these
combined events $\Omega_F$ and $\Omega_D$, and



Multiplying by a scalar yields: 



P(Q F ) = 1-J2p f (1-PfP> 



f(ca = (x,y)) 



f(a x 



8 = 
k-1 



p(q d ) = p D *Y,pki-PF)W- i 

8 = 

The following sections show the derivation of these 
distributions, and the results of the analysis both ana- 
lytically and empirically. 

5 Deriving the Projected Gaussian 
Distribution 

In [GHJ91] an analytic expression for the case of circular
error disks was derived as follows: given 3 model points
(with model space coordinates) as basis, and the affine
coordinates of a fourth model point with respect to this
basis, the expression for the coordinates of the fourth
point in model space is

\[ \vec{m}_4 = \vec{m}_1 + \alpha(\vec{m}_2 - \vec{m}_1) + \beta(\vec{m}_3 - \vec{m}_1). \]

Under an arbitrary affine transformation $T$, each model
point projects to the location

\[ \vec{s}_i = T\vec{m}_i + \vec{e}_i \]

where $\vec{e}_i$ is a vector drawn from the error distribution.
The possible location of the fourth model point is found
by plugging the first expression into the second equation,
to yield

\[ \vec{s}_4 = T\vec{m}_4 + \vec{e}_4 \]

where

\[ \vec{e}_4 = (1 - \alpha - \beta)\vec{e}_1 + \alpha\vec{e}_2 + \beta\vec{e}_3 + \vec{e}_0. \qquad (1) \]

When the error vector is drawn from a uniform circular
distribution with radius $\epsilon$, the magnitude of the
projected error vector is bounded by

\[ \epsilon\,[\,|1 - \alpha - \beta| + |\alpha| + |\beta| + 1\,] \qquad (2) \]

For this paper, the sensor error vector is drawn from a
two dimensional circular Gaussian distribution. The 2D
Gaussian probability density of a random variable $a$ with
zero covariance is denoted as:

\[ f(a = (x, y)) = \frac{1}{2\pi\sigma_{a_x}\sigma_{a_y}} e^{-\frac{x^2}{2\sigma_{a_x}^2} - \frac{y^2}{2\sigma_{a_y}^2}} = f(a_x = x) f(a_y = y) \]

Because the two components are independent, the probability
density of the sum of two random variables with
2D Gaussian distribution and zero covariance is:

\[ f(a + b = (x, y)) = f(a_x + b_x = x, a_y + b_y = y) \]

Convolution in each dimension yields:

\[ f(a + b = (x, y)) = \frac{1}{2\pi\sigma_x\sigma_y} e^{-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2}}, \qquad \sigma_x^2 = \sigma_{a_x}^2 + \sigma_{b_x}^2, \quad \sigma_y^2 = \sigma_{a_y}^2 + \sigma_{b_y}^2 \]

Therefore, assuming $\vec{e}_i$ to be 2D Gaussian with zero
covariance and standard deviations $\sigma_{i_x} = \sigma_{i_y} = \sigma$, the
distribution of the vector in equation (1) is a 2D Gaussian
with zero covariance and standard deviation:

\[ \sigma_e = \sigma\sqrt{(1 - \alpha - \beta)^2 + \alpha^2 + \beta^2 + 1} \]

in both the x and y direction. Because the Gaussian
distribution is not bounded, we choose to terminate the
search for points after a radius of $2\sigma_e$, which means that
we will find an image feature arising from a model point
86.5% of the time (this is demonstrated in a later section).
Note that this expression is always smaller than
its analogous expression for disk radius in the uniform
bounded error model from equation (2) because of the
triangle inequality. In the comparison, $\epsilon = 2\sigma$.
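The comparison can be spelled out in one step. Writing $w_1 = 1 - \alpha - \beta$, $w_2 = \alpha$, $w_3 = \beta$ and $w_4 = 1$ for the four weights in equation (1) (shorthand introduced here only for this check), expanding the square of the sum of absolute values gives

\[
\Bigl(\sum_{i=1}^{4} |w_i|\Bigr)^2 = \sum_{i=1}^{4} w_i^2 + 2\sum_{i<j} |w_i|\,|w_j| \;\ge\; \sum_{i=1}^{4} w_i^2 ,
\]

so that, with $\epsilon = 2\sigma$,

\[
2\sigma_e = 2\sigma\sqrt{\sum_{i=1}^{4} w_i^2} \;\le\; 2\sigma\sum_{i=1}^{4} |w_i| = \epsilon\,[\,|1 - \alpha - \beta| + |\alpha| + |\beta| + 1\,].
\]

The inequality is strict because $w_4 = 1$ and $w_1 + w_2 + w_3 = 1$, so at least two of the weights are nonzero and the cross terms cannot all vanish.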

6 Determining the Distribution for $\sigma_e$

In the analysis we use two different probability densities
for $\sigma_e$, one for correct basis matchings and one for incorrect
basis matchings. Intuitively, this is due to the fact
that when an incorrect basis matching is tested, more often
than not the projected model points fall outside the
image range and are thrown away, while when a correct
hypothesis is tested the remaining model points always
project to within the image. In tests we have observed
that over half of the incorrect hypotheses are rejected for
this reason, leading to an altered density for $\sigma_e$.

Let us call the two distributions $f_H(\sigma_e)$ and $f_{\bar{H}}(\sigma_e)$.
We empirically estimate the former distribution by generating
a random model of size 25; then, for each ordered
triple of model points as basis, we increment a histogram
for the value of $\sigma_e$ as a function of $\alpha$ and $\beta$ for all the
other model points with respect to that basis. For the
latter distribution, we generate a random model of size
4 and a random image, and histogram the values of $\sigma_e$
for only those cases in which the initial basis matching
causes the remaining model point to fall within the image.
The distributions for $\sigma_e$ found in this manner have
been observed to be invariant over many different values
of model and image points.

The model is constrained such that the maximum dis- 
tance between any two model points is not greater than 
10 times the minimum distance, and in the basis selec- 
tion, no basis is chosen such that the angle ip between 
the two axes is <| ip |< -f^ or j^ir < ip < t?tt. This is 
done to avoid unstable bases. 

The results were almost identical in every test we ran;
two typical normalized histograms are shown in figure 1.
For a choice of $\sigma = 2.5$, the histograms very closely fit
the curves $f_H(\sigma_e) = (b_1\sigma_e)^{-2}$, $b_1 = 0.58$, and
$f_{\bar{H}}(\sigma_e) = (b_0\sigma_e)^{-4}$, $b_0 = 0.35$, between the limits $s_1 = 2.875$ and
$s_2 = 120$. Figure 1 shows the estimated density functions
superimposed on the empirical distributions. The
integrals of the analytic expressions thus defined are 1.009
and 0.975, respectively.
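The normalization of a fitted power-law density can be checked in closed form, since $\int_{s_1}^{s_2} (b\sigma)^{-p}\, d\sigma = \frac{s_1^{1-p} - s_2^{1-p}}{(p - 1)\, b^p}$ for $p > 1$. The sketch below evaluates this expression; the function name is hypothetical and the constants are the rounded values quoted above. For the second density the constant enters to the fourth power, so the rounded value 0.35 need not reproduce the reported integral to three digits.

```python
def power_law_integral(b, p, s1, s2):
    """Closed form of the integral of (b * s) ** (-p) over [s1, s2], for p > 1."""
    return (s1 ** (1 - p) - s2 ** (1 - p)) / ((p - 1) * b**p)


if __name__ == "__main__":
    s1, s2 = 2.875, 120.0
    # Density for correct matchings: exponent 2, constant 0.58.
    print(power_law_integral(0.58, 2, s1, s2))
    # Density for incorrect matchings: exponent 4, rounded constant 0.35.
    print(power_law_integral(0.35, 4, s1, s2))
```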

7 Derivation of the Single Point
Distributions

In this section we show the derivation of the distributions
$f(V_H = v)$, the density function on the values that an
image point contributes to a model basis given that the
point comes from the model, and $f(V_{\bar{H}} = v)$, the density
function on the values that an image point contributes
to a basis given that it is a random point. We begin with
the former.

7.1 Deriving $f(V_H)$

Given a correct hypothesis and no occlusion, the location
of a projected model point can be modeled as a vector
$\vec{d}$ centered at the predicted location with Gaussian distribution
(expressed in polar coordinates)

\[ f(\vec{d} = (r, \theta)) = \frac{1}{2\pi\sigma_e^2} e^{-\frac{r^2}{2\sigma_e^2}} \]

where we know $\sigma_e$ and its distribution. We now choose
an evaluation function $g(\vec{d})$, which we use to weight
a match that is offset by $\vec{d}$ from the predicted match
location. We want to find its density, i.e., we want
$f(v = g(\vec{d}))$, where the distribution of $\vec{d}$ is as stated.
As mentioned, we choose the evaluation function

\[ g(\vec{d}) = \frac{1}{2\pi\sigma_e^2} e^{-\frac{r^2}{2\sigma_e^2}} \]

Since the evaluation function $g$ is really a function of $r$
alone, we need to know the density function of $r$. To find
this, we integrate $f(r, \theta)$ over $\theta$:

\[ f(r) = \int_0^{2\pi} \frac{r}{2\pi\sigma_e^2} e^{-\frac{r^2}{2\sigma_e^2}} \, d\theta = \frac{r}{\sigma_e^2} e^{-\frac{r^2}{2\sigma_e^2}} \]

Next, we want to find the density of the weight function
$v = g(r)$. The change of variables formula for a
monotonically decreasing function is:

\[ \mathrm{dens}(g(r) = v) = \frac{-f(g^{-1}(v))}{g'(g^{-1}(v))} \]

Working through the steps, we find

\[ f(v) = 2\pi\sigma_e^2 \]

It may seem counterintuitive that the resulting distribution
is constant. However, this can be understood
if one considers an example in which $f(r, \theta)$ is uniformly
distributed. Integrating over all angles yields a linearly
increasing function in $r$. Assigning an evaluation function
$g(\vec{d})$ which is inversely proportional to $r$ yields a
constant density function on $f(v)$. The same thing is
happening here, only quadratically. Since we only search
for a match out to a radius of $2\sigma_e$, the effective distribution
is:

\[ f(v \mid \sigma_e) = \begin{cases} e^{-2} & v = 0 \\ 2\pi\sigma_e^2 & \frac{e^{-2}}{2\pi\sigma_e^2} \le v \le \frac{1}{2\pi\sigma_e^2} \\ 0 & \text{otherwise} \end{cases} \]

i.e., we will miss a good point $e^{-2} = 13.5\%$ of the time.
This expression correctly integrates to 1. Now, note that
in the expression we have a fixed $\sigma_e$, i.e., we actually
have derived $f(v = g(r) \mid \sigma_e)$. We need to integrate this
expression over all values of $\sigma_e$, that is:



\[ f(V_H = v) = \int f(v = g(r) \mid \sigma_e = \sigma)\, f_H(\sigma_e = \sigma)\, d\sigma \]

There are two things to take into consideration when
calculating the limits for this expression: first, the possible
values of $\sigma_e$ range from a lower limit $s_1$ to an upper
limit $s_2$, due to limits on the values of the affine coordinates.
(Earlier, we saw for $\sigma = 2.5$, that $s_1 = 2.875$, $s_2 = 120$.)
Also, for a given $\sigma_e$, it is clear that the maximum
value we can achieve is when $r = 0 \Rightarrow v = \frac{1}{2\pi\sigma_e^2}$,
and the minimum value we can achieve is at the cutoff
point $r = 2\sigma_e \Rightarrow v = \frac{1}{2\pi\sigma_e^2} e^{-2}$. Setting $v$ to each of
these expressions and solving for $\sigma_e$ leads to the conclusion
that for a particular value $v$, the only values for
$\sigma_e$ such that $g(\vec{d} \mid \sigma_e)$ could equal $v$ are in the range

\[ \left( \frac{1}{e\sqrt{2\pi v}},\ \frac{1}{\sqrt{2\pi v}} \right) \]

Figure 1: The distributions $f_H(\sigma_e)$ and $f_{\bar{H}}(\sigma_e)$.

7.1.1 Adding Occlusion

It is easy to add occlusion into this distribution by
considering an independent process whose probability of
occluding any given point is $c$. Therefore, the above
distribution is modified as follows:

\[ f_c(V_H = v) = \begin{cases} f(V_H = 0)(1 - c) + c & v = 0 \\ f(V_H = v)(1 - c) & \text{otherwise} \end{cases} \]

We will use the distribution $f$, not $f_c$, in the rest of
the paper, and will reconsider the rate of occlusion only
in the context of calculating false negatives in a later
section.
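The constant conditional density derived in Section 7.1 can be checked with a small simulation. The sketch below is illustrative only: it draws displacement vectors from a circular 2D Gaussian with a fixed projected standard deviation, applies the Gaussian-valued weight with a cutoff at two standard deviations, and reports the fraction of missed points together with the fraction of samples in four equal-width bins of the nonzero weight range. If the density is constant, the bins should be equally populated and the miss fraction should be close to $e^{-2}$. The function names, the seed, and the number of bins are arbitrary choices.

```python
import math
import random


def simulate_weights(sigma_e, trials, seed=0):
    """Draw Gaussian displacements and return the weight each one contributes."""
    rng = random.Random(seed)
    peak = 1.0 / (2.0 * math.pi * sigma_e**2)
    weights = []
    for _ in range(trials):
        dx, dy = rng.gauss(0.0, sigma_e), rng.gauss(0.0, sigma_e)
        r2 = dx * dx + dy * dy
        if r2 > (2.0 * sigma_e) ** 2:
            weights.append(0.0)  # beyond the search radius: point is missed
        else:
            weights.append(peak * math.exp(-r2 / (2.0 * sigma_e**2)))
    return weights


def summarize(weights, sigma_e, bins=4):
    """Return (miss fraction, fraction of all samples in each weight bin)."""
    v_max = 1.0 / (2.0 * math.pi * sigma_e**2)
    v_min = v_max * math.exp(-2.0)
    counts = [0] * bins
    misses = 0
    for w in weights:
        if w == 0.0:
            misses += 1
        else:
            index = min(bins - 1, int((w - v_min) / (v_max - v_min) * bins))
            counts[index] += 1
    total = len(weights)
    return misses / total, [c / total for c in counts]


if __name__ == "__main__":
    sigma_e = 3.0
    miss, fractions = summarize(simulate_weights(sigma_e, 40000), sigma_e)
    print(miss, fractions)
```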

7.2 Deriving $f(V_{\bar{H}})$

We do the same derivation for the distribution $f(V_{\bar{H}})$.
Given a hypothesis and a random point, we calculate
the distribution as follows: let event $A$ = "point falls in
hypothesized error disk". Its probability is the area of the error
disk over the size of the image $R^2$, i.e.,

\[ P(A \mid \sigma_e) = \frac{4\pi\sigma_e^2}{R^2}, \qquad P(\bar{A} \mid \sigma_e) = \frac{R^2 - 4\pi\sigma_e^2}{R^2} \]

Now we calculate the probability that a point which
is uniformly distributed inside a disk of radius $2\sigma_e$
contributes value $v$ for an incorrect hypothesis, using the
evaluation function defined in the previous section. As
before, we must express a uniform distribution in polar
coordinates and then integrate over $\theta$ to get the distribution
in terms of $r$ alone, since the evaluation function
$g$ is a function of $r$:

\[ f(r, \theta) = \frac{1}{\pi(2\sigma_e)^2} \]

\[ f(r) = \int_0^{2\pi} \frac{r}{\pi(2\sigma_e)^2}\, d\theta = \frac{r}{2\sigma_e^2} \]

As before, we calculate the density $f(v \mid A, \sigma_e)$
with the new distribution for $r$ and get:

\[ f(v \mid A, \sigma_e) = \frac{-f(g^{-1}(v))}{g'(g^{-1}(v))} = \frac{1}{2v} \]

Therefore, the density function of $v$ for a fixed $\sigma_e$ is:

\[ f(V_{\bar{H}} = v \mid \sigma_e) = \begin{cases} P(\bar{A} \mid \sigma_e) & v = 0 \\ f(v \mid A, \sigma_e) P(A \mid \sigma_e) = \frac{2\pi\sigma_e^2}{R^2 v} & \frac{1}{2\pi\sigma_e^2 e^2} \le v \le \frac{1}{2\pi\sigma_e^2} \\ 0 & \text{otherwise} \end{cases} \]

Again, this expression correctly integrates to 1. As before,
we need to integrate over all values of $\sigma_e$:



fiV-w = v) 



fiVn 



s = cr )/if( s = a ) da 



i^)iha)-Ua 



2tt 



b A R 2 v 



2 da 



Dealing with $v = 0$ as a separate case, and with the
same bounds as before, integrating yields:



fiv-w) 




otherwise 



where

We ran an experiment to test the analysis of this section,
and the results are shown in Figure 2. Both graphs
show a normalized histogram of the results of 15,000 independent
trials. The first graph indicates the empirical
results corroborating the predictions very closely. While
the comparison of the second graph is less visually striking,
note that the deviation at any point between the
empirical and predicted results is generally less than one
count.

Figure 2: Distributions $f(v)$, with and without model



8 Finding the Weight Density of a 
Model in an Image 

Having found the single point densities, we use them to
find the density of the combined weight of points for correct
and incorrect hypotheses. We start with the density
function on weights of correct hypotheses. For a model
of size $m$ and an image of size $n$, a correct hypothesis
should have weight density

\[ f(W_{H_m} = v) = \bigotimes_{i=1}^{m-3} f(V_{H_i}) \otimes \bigotimes_{i=1}^{n-m} f(V_{\bar{H}_i}) \]

assuming that each point contributes weight to its supporting
basis independently of any other. In order to
avoid convolving the distributions from the previous section,
we find the expected value and the standard deviation
of the distributions and invoke the central limit
theorem to claim that the combined weight of a correct
hypothesis of a size $m$ model in a size $n$ image
should roughly follow the distribution:

\[ N\left(m E_H + (n - m) E_{\bar{H}},\ m \sigma_H^2 + (n - m) \sigma_{\bar{H}}^2\right) \]

in which

\[ E_H(v) = \int v f_c(v)\, dv = \frac{1 - e^{-4}}{12\pi} \cdot \frac{1 - c}{b_1^2} \left[ \frac{1}{s_1^3} - \frac{1}{s_2^3} \right] \]

\[ E_H(v^2) = \int v^2 f_c(v)\, dv = \frac{1 - e^{-6}}{60\pi^2} \cdot \frac{1 - c}{b_1^2} \left[ \frac{1}{s_1^5} - \frac{1}{s_2^5} \right] = 1.6845 \times 10^{-3} \times \frac{1 - c}{b_1^2} \left[ \frac{1}{s_1^5} - \frac{1}{s_2^5} \right] \]

\[ \sigma_H^2 = E_H(v^2) - E_H(v)^2 \]

For an incorrect hypothesis we look at the problem in two
steps. First we derive, as above, the mean and standard
deviation of the process in which $n = m = 4$, i.e., a single
random image point drops into a single error circle. From
the distribution $f(V_{\bar{H}})$, we find:

\[ E_{\bar{H}}(v) = \int v f(V_{\bar{H}})\, dv \]

\[ E_{\bar{H}}(v^2) = \int v^2 f(V_{\bar{H}})\, dv \]

\[ \sigma_{\bar{H}}^2(v) = E_{\bar{H}}(v^2) - E_{\bar{H}}(v)^2 \]

Plugging in the values $s_1 = 2.875$, $s_2 = 120$, $b_0 = 0.35$,
$b_1 = 0.58$, $c = 0$, and $R = 500$ for the experimental
data of section 6 yields

\[ E_H = 3.26 \times 10^{-3}, \quad \sigma_H^2 = 1.49 \times 10^{-5}, \quad E_{\bar{H}} = 3.19 \times 10^{-6}, \quad \sigma_{\bar{H}}^2 = 2.08 \times 10^{-8} \]

Note that the value of the limit $s_2$ was determined
empirically and is a function of the constraints on the
bases that are chosen. Without the basis constraints,
$s_2$ tends to infinity, and in fact the values of these parameters
for $s_2 = 120$ and $s_2 = \infty$ are not significantly
different.



Now, consider a single random image point (i.e.,
$n = 4$; three for the hypothesis and one left over)
dropped into an image where a model of size $m > 4$
is hypothesized to be. In this case the event that the
random point will contribute weight $v$ to this hypothesis
is calculated as follows: Let event $A_i$ = "point drops in
the $i$th circle." Then,

\[ f(V_{\bar{H}_m} = v \mid v \ne 0) = f(v, A_1) + f(v, A_2) + \dots + f(v, A_{m-3}) = (m - 3) f(v, A_1) \]

Note that because we are assuming the circles are disjoint,
we are overestimating the probability of the point
falling in any circle. The actual rate of detection will
be lower than our assumption, especially as $m$ grows
large.



/(%„, = ») 



1 _ (m-3)4?r r 1 
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(m-3)27T J-/ 
R 2 b*v IA e 
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otherwise 



As $m$ grows large, $\left(1 - (m - 3)\frac{4\pi}{R^2 b_1^2}[s_2 - s_1]\right) < 0$, so
this expression is no longer a density function. This is
the point at which the model covers so much of the image
that a random point will always contribute to some
incorrect hypothesis. Therefore, this analysis only applies
to models for which $m < \frac{R^2 b_1^2}{4\pi[s_2 - s_1]} + 3$. For
$R = 500$, $m < 60$, and for $R = 256$, $m < 18$.

The mean and standard deviation for one random
point dropping into $m - 3$ random circles are:

\[ E_{\bar{H}_m}(v) = \int v\,[(m - 3) f(V_{\bar{H}})]\, dv = (m - 3) E_{\bar{H}}(v) \]

\[ E_{\bar{H}_m}(v^2) = \int v^2\,[(m - 3) f(V_{\bar{H}})]\, dv = (m - 3) E_{\bar{H}}(v^2) \]

\[ \sigma_{\bar{H}_m}^2 = E_{\bar{H}_m}(v^2) - E_{\bar{H}_m}(v)^2 = (m - 3) E_{\bar{H}}(v^2) - (m - 3)^2 E_{\bar{H}}(v)^2 \]

Dropping $n$ points convolves this distribution with itself $n - 3$ times:

$$f(W_{\bar H_{m,n}} = v) = \bigotimes_{i=1}^{n-3} f(V_{\bar H_m} = v)$$

And therefore the weight that an $n$-size random image contributes to an incorrectly hypothesized model of size $m$ follows the distribution:

$$N\big((n-3)\, E_{\bar H_m},\ (n-3)\, \sigma^2_{\bar H_m}\big)$$

Note that this is the weight density of a single incorrect hypothesis.
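As a worked check of the scaling above, the following minimal Python sketch computes the mean and variance of the total weight of one incorrect hypothesis from the single-circle moments. The function name and argument names are ours and purely illustrative; the single-circle values are the rounded ones quoted earlier for $R = 500$, $c = 0$, and the contributions of different points are assumed independent.

```python
def incorrect_weight_moments(m, n, mean_single, second_moment_single):
    """Mean and variance of the total weight of one incorrect hypothesis.

    mean_single and second_moment_single are E(v) and E(v^2) for one random
    point and ONE circle; they are inputs here, not derived by this sketch.
    """
    circles = m - 3  # model points outside the basis
    points = n - 3   # image points outside the basis
    mean_point = circles * mean_single
    var_point = circles * second_moment_single - mean_point ** 2
    # Summing independent contributions scales mean and variance by `points`.
    return points * mean_point, points * var_point


# Rounded single-circle values quoted in the text (R = 500, c = 0).
E_SINGLE = 3.19e-6
VAR_SINGLE = 2.08e-8
SECOND_MOMENT_SINGLE = VAR_SINGLE + E_SINGLE ** 2

mean_w, var_w = incorrect_weight_moments(13, 13, E_SINGLE, SECOND_MOMENT_SINGLE)
print(f"mean = {mean_w:.4e}, variance = {var_w:.4e}")
```

For $m - 3 = n - 3 = 10$ the result lies within one percent of the predicted entries 3.1940E-4 and 2.0668E-6 in the bottom table of Figure 3; the small gap comes from the rounded inputs.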

The means for both distributions were tested empirically from the same experiment as shown in Figure 2. A table of values is given in Figure 3.

9 Interpreting the Results 

We have derived expressions for the weight densities of a hypothesis given that it is incorrect, and given that it is correct. We are interested in using these distributions to determine the effectiveness of geometric hashing under different clutter conditions. To do this, we briefly introduce the ROC (receiver operating characteristic) curve, a concept borrowed from standard hypothesis testing theory, and cast our problem in terms of this framework.

9.1 ROC: Introduction 

The problem is to decide which one of two hypotheses, $H_0$ and $H_1$, is correct. There is a random variable whose distribution is known given one or the other hypothesis, i.e., we know $f(X \mid H_0)$ and $f(X \mid H_1)$. Let the space of all possible values of the random variable $X$ be divided into two regions, $Z_0$ and $Z_1$, such that we decide $H_0$ if the value of $X$ falls in $Z_0$ and $H_1$ if $X$ falls in $Z_1$. Then we can define the quantities

$$\Pr(\text{say } H_0 \mid H_0 \text{ is true}) = \int_{Z_0} p(X \mid H_0)\, dX$$

$$P_F = \Pr(\text{say } H_1 \mid H_0 \text{ is true}) = \int_{Z_1} p(X \mid H_0)\, dX$$

$$P_M = \Pr(\text{say } H_0 \mid H_1 \text{ is true}) = \int_{Z_0} p(X \mid H_1)\, dX$$

$$P_D = \Pr(\text{say } H_1 \mid H_1 \text{ is true}) = \int_{Z_1} p(X \mid H_1)\, dX$$

These quantities are often referred to as $P_M$ = "Probability of a miss", $P_D$ = "Probability of detection", and $P_F$ = "Probability of false alarm" for historical reasons.

One way of constructing a decision rule is to use 
the likelihood ratio test (LRT) to divide the observation 
space into decision regions, i.e., 





With M

| | Mean: Empirical | Mean: Predicted | Mean: Emp/Pred | Variance: Empirical | Variance: Predicted | Variance: Emp/Pred |
|---|---|---|---|---|---|---|
| m-3=1, n-3=1 | 3.6953E-3 | 3.2177E-3 | 1.148 | 1.5186E-5 | 1.4625E-5 | 1.038 |
| m-3=1, n-3=100 | 3.8383E-3 | 3.5339E-3 | 1.086 | 1.7350E-5 | 1.6680E-5 | 1.040 |
| m-3=1, n-3=500 | 4.8026E-3 | 4.8115E-3 | .9981 | 2.2274E-5 | 2.4984E-5 | .8915 |
| m-3=5, n-3=5 | 1.9658E-2 | 1.6089E-2 | 1.222 | 1.4927E-4 | 7.3124E-5 | 2.041 |
| m-3=10, n-3=10 | 4.1986E-2 | 3.2177E-2 | 1.305 | 5.4130E-4 | 1.4625E-4 | 3.701 |
| m-3=10, n-3=100 | 4.4513E-2 | 3.5052E-2 | 1.270 | 5.3400E-4 | 1.6485E-4 | 3.239 |
| m-3=10, n-3=500 | 5.5476E-2 | 4.7828E-2 | 1.160 | 5.7484E-4 | 2.4752E-4 | 2.322 |

Without M

| | Mean: Empirical | Mean: Predicted | Mean: Emp/Pred | Variance: Empirical | Variance: Predicted | Variance: Emp/Pred |
|---|---|---|---|---|---|---|
| m-3=1, n-3=1 | 3.2410E-6 | 3.1940E-6 | 1.015 | 1.8747E-8 | 2.0760E-8 | .8897 |
| m-3=1, n-3=100 | 3.0681E-4 | 3.1940E-4 | .9606 | 1.9738E-6 | 2.0760E-6 | .9508 |
| m-3=1, n-3=500 | 1.6344E-3 | 1.5970E-3 | 1.023 | 1.1163E-5 | 1.0380E-5 | 1.075 |
| m-3=5, n-3=5 | 8.9131E-5 | 7.9850E-5 | 1.116 | 6.4808E-7 | 5.1797E-7 | 1.251 |
| m-3=10, n-3=10 | 3.4949E-4 | 3.1940E-4 | 1.094 | 2.4001E-6 | 2.0668E-6 | 1.161 |
| m-3=10, n-3=100 | 3.5082E-3 | 3.1940E-3 | 1.098 | 2.3277E-5 | 2.0668E-5 | 1.126 |
| m-3=10, n-3=500 | 1.6289E-2 | 1.5970E-2 | 1.020 | 1.0766E-4 | 1.0334E-4 | 1.042 |

Figure 3: A table of predicted versus empirical means and variances of the distribution $f(W_{H_{m,n}} = v)$, in the top table, and $f(W_{\bar H_{m,n}} = v)$ in the bottom table, for different values of $m$ and $n$.



$$\frac{p(X \mid H_1)}{p(X \mid H_0)} \underset{H_0}{\overset{H_1}{\gtrless}} \eta$$

That is, if the ratio of the conditional densities is greater than a fixed threshold $\eta$, choose $H_1$, otherwise choose $H_0$. Note that changing the value of $\eta$ changes the decision regions and thus the values of $P_F$ and $P_D$. The ROC curve is simply the graph of $P_D$ versus $P_F$ as a function of threshold for the LRT. As it turns out, both the Neyman-Pearson test and the optimal Bayes test involve this LRT, thus the ROC curve encapsulates all information needed for either test, since any $(P_F, P_D)$ point yielded by either test necessarily lies on the ROC curve. If the prior probabilities of $H_0$ and $H_1$ are known, then the optimal Bayes decision rule picks the ROC point which minimizes the expected cost of the decision by using the LRT in which the threshold is a function of the costs and priors involved:

$$\eta = \frac{(C_{10} - C_{00})\, P_0}{(C_{01} - C_{11})\, P_1}$$

where $C_{ij}$ is the cost associated with choosing hypothesis $i$ given that hypothesis $j$ is correct. In the absence of such priors, a Neyman-Pearson test is often considered optimal, in which one simply picks a point on the ROC curve which gives satisfactory performance. Note that this is not the same as minimizing the decision's expected cost.
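To make the dependence of the Bayes threshold on costs and priors concrete, here is a minimal Python sketch of the threshold formula. The function name, argument names and the numerical priors and costs are illustrative assumptions only; they are not values used in this analysis.

```python
def bayes_threshold(p0, p1, c00, c01, c10, c11):
    """LRT threshold eta for the Bayes test.

    p0, p1 : prior probabilities of H0 and H1
    cij    : cost of choosing hypothesis i when hypothesis j is correct
    """
    return ((c10 - c00) * p0) / ((c01 - c11) * p1)


# Equal priors, unit cost for either kind of error, no cost when correct.
eta_equal = bayes_threshold(0.5, 0.5, c00=0.0, c01=1.0, c10=1.0, c11=0.0)

# A model that is rarely present (hypothetical prior of 0.1) raises the
# threshold, so stronger evidence is needed before declaring H1.
eta_rare = bayes_threshold(0.9, 0.1, c00=0.0, c01=1.0, c10=1.0, c11=0.0)

print(eta_equal, eta_rare)
```

With equal priors and symmetric costs the threshold is 1, i.e., the test simply picks the hypothesis with the larger conditional density.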

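For example, assume for our problem that $H_0 \sim N(m_0, \sigma_0^2)$ and $H_1 \sim N(m_1, \sigma_1^2)$, and assume that $m_1 > m_0$ and $\sigma_1 > \sigma_0$. The likelihood ratio test yields:

$$\left(\frac{X - m_0}{\sigma_0}\right)^2 - \left(\frac{X - m_1}{\sigma_1}\right)^2 \underset{H_0}{\overset{H_1}{\gtrless}} 2 \ln\left(\frac{\eta\, \sigma_1}{\sigma_0}\right) = \gamma$$

The regions $Z_0$ and $Z_1$ are found by solving the above equation for equality,

$$X_1, X_2 = \frac{(m_1 \sigma_0^2 - m_0 \sigma_1^2) \pm \sigma_0 \sigma_1 \left(\gamma\, [\sigma_1^2 - \sigma_0^2] + (m_0 - m_1)^2\right)^{1/2}}{\sigma_0^2 - \sigma_1^2}$$

Because $\sigma_1 > \sigma_0$, the denominator is negative, so $X_1 < X_2$, and the left-hand side of the test exceeds $\gamma$ outside the interval $[X_1, X_2]$. Hence $Z_0 = [X_1, X_2]$ and $Z_1$ is its complement.

The values of $P_F$ and $P_D$ are found by integrating the conditional probability densities $p(X \mid H_0)$ and $p(X \mid H_1)$ over the region $Z_1$:

$$P_F = \int_{Z_1} p(X \mid H_0)\, dX = 1 - \int_{X_1}^{X_2} \frac{1}{\sqrt{2\pi}\, \sigma_0}\, e^{-\frac{(X - m_0)^2}{2 \sigma_0^2}}\, dX$$

$$P_D = \int_{Z_1} p(X \mid H_1)\, dX = 1 - \int_{X_1}^{X_2} \frac{1}{\sqrt{2\pi}\, \sigma_1}\, e^{-\frac{(X - m_1)^2}{2 \sigma_1^2}}\, dX$$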



In Figure 4, for example, we have plotted the ROC curve for the distributions $f(X \mid H_0)$ and $f(X \mid H_1)$ alongside. The axes are $x = P_F$, $y = P_D$. The line $x = y$ is a lower bound, since points on this line indicate that any decision is as likely to be true as false, so the observed value of $X$ gives us no information. Though an ROC curve is a 3D entity (i.e., a curve in $(P_F, P_D, \eta)$ space), we display its projection onto the $\eta = 0$ plane and can easily find the associated $\eta$ value for any $(P_F, P_D)$ pair. When the threshold is high there is zero probability of a false positive, but zero probability of correct identification as well. As the threshold goes down, the probabilities of both occurrences go up until the threshold is so low that both correct and false identification are certain. In our problem we assume that we do not have priors, so our goal is to pick a threshold such that we have a very high probability of identification and a low probability of false positives, i.e., we are interested in picking a point as close to the upper left hand corner as possible. Note that the larger the separation between the two hypothesis distributions, the more the curve is pushed in that direction.

Figure 4: On the left are displayed the conditional probability density functions of a random variable $X$. On the right is the associated ROC curve, where $P_F$ and $P_D$ correspond to the x and y axes, respectively.
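The two-Gaussian example can be traced numerically. The sketch below is a minimal Python illustration, assuming $\sigma_1 > \sigma_0$ as in the text; the function names and the particular means and standard deviations are ours and are not taken from the experiments. It solves the quadratic for the boundaries of $Z_0$ and integrates each Gaussian over the complement.

```python
import math


def normal_cdf(x, mean, sigma):
    """Cumulative distribution function of N(mean, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


def lrt_roc_point(eta, m0, s0, m1, s1):
    """Return (P_F, P_D) for LRT threshold eta > 0, assuming s1 > s0."""
    gamma = 2.0 * math.log(eta * s1 / s0)
    disc = gamma * (s1 ** 2 - s0 ** 2) + (m0 - m1) ** 2
    if disc <= 0.0:
        # No real boundaries: the test statistic exceeds gamma everywhere,
        # so H1 is always chosen.
        return 1.0, 1.0
    root = s0 * s1 * math.sqrt(disc)
    centre = m1 * s0 ** 2 - m0 * s1 ** 2
    denom = s0 ** 2 - s1 ** 2  # negative because s1 > s0
    x1 = (centre + root) / denom  # lower boundary of Z0
    x2 = (centre - root) / denom  # upper boundary of Z0
    p_f = 1.0 - (normal_cdf(x2, m0, s0) - normal_cdf(x1, m0, s0))
    p_d = 1.0 - (normal_cdf(x2, m1, s1) - normal_cdf(x1, m1, s1))
    return p_f, p_d


# Hypothetical distributions: H0 ~ N(0, 1), H1 ~ N(2, 1.5^2).
etas = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
roc = [lrt_roc_point(eta, 0.0, 1.0, 2.0, 1.5) for eta in etas]
for eta, (p_f, p_d) in zip(etas, roc):
    print(f"eta = {eta:5.1f}  P_F = {p_f:.4f}  P_D = {p_d:.4f}")
```

Sweeping $\eta$ from small to large moves the operating point from the upper right corner $(1, 1)$ down towards the origin, and every point stays on or above the line $x = y$.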

9.2 Applying ROC to Geometric Hashing 

In our problem formulation, $H_0$ is the hypothesis that the model is not in the image, and $H_1$ is the hypothesis that it is. In our case, we have a different ROC curve associated with every fixed $(m, n)$ pair, where $m$ and $n$ are the number of model and image features, respectively.

The next examples show the predicted comparison of the Gaussian model to the bounded uniform model. Figure 5 shows the ROC curves for the Gaussian and uniform models, $m - 3 = 10$, $n - 3 = 10, 50, 100, 500, 1000$, occlusion = 0.0 and 0.25. We can see that in the case of no occlusion, for small values of $n$, both models predict good $P_F$ vs. $P_D$ curves, though the bounded uniform model will always be better because there is no possibility of a false negative for occlusion = 0, while in the unbounded Gaussian case there always is. However, as $n$ increases, the uniform model breaks down more rapidly than the Gaussian model for both occlusion values. For occlusion = 0.25, both models perform about equally for small values of $n$ (for example, at $n = 100$), but again as $n$ increases, the uniform error model fails more dramatically than the Gaussian model ($n > 500$).

Using this technique, we can predict thresholds for 
actual experiments, as shown in the next section. 

10 Experiment 

The predictions of the previous section were tested in the following experiment: to test an ROC curve for model size $m$ and image size $n$, we run two sets of trials, one to test the probability of detection and one to test the probability of false alarm. For the former, a random model of size $m$ consisting of point features was generated and projected into an image, with Gaussian noise ($\sigma = 2.5$) added to both the x and y positional components of each point feature. Occlusion ($c$) is simulated by giving each point a probability $c$ of not appearing in the resulting image. Only correct correspondences are tested, and the weight of each of these correct hypotheses is found using the algorithm:

(a) For a correct hypothesis $(m_0 : i_0,\ m_1 : i_1,\ m_2 : i_2)$, for every other model point $m_j$:
    (i) Find coordinates $m_j = (\alpha_j, \beta_j)$ with respect to basis $(m_0, m_1, m_2)$, and from this, $\sigma_e$.
    (ii) For every image point $i_k$, find the minimum distance $d$ between $i_k$ and any of the projected points such that $d < 2\sigma_e$. Add $\frac{1}{2\pi\sigma_e^2}\, e^{-\frac{d^2}{2\sigma_e^2}}$ to the supporting weight for this hypothesis.

(b) If the weight of the vote for this hypothesis is greater than some threshold $\theta$, stop and output this as a correct instance of the model.
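The accumulation step can be sketched in a few lines of Python. This is an illustrative reading of steps (a)(ii) and (b), not the code used for the experiments: the function names are ours, and the projected model points are assumed to be supplied together with their already computed $\sigma_e$ values, since the sketch does not derive $\sigma_e$ from $(\alpha_j, \beta_j)$.

```python
import math


def hypothesis_weight(projected, image_points):
    """Accumulate the supporting weight of one hypothesis.

    projected    : list of ((x, y), sigma_e), one entry per projected model
                   point outside the basis (sigma_e supplied by the caller)
    image_points : list of (x, y) image features outside the basis
    """
    total = 0.0
    for px, py in image_points:
        best = None  # (distance, sigma_e) of the nearest admissible point
        for (mx, my), sigma_e in projected:
            d = math.hypot(px - mx, py - my)
            if d < 2.0 * sigma_e and (best is None or d < best[0]):
                best = (d, sigma_e)
        if best is not None:
            d, sigma_e = best
            total += math.exp(-d * d / (2.0 * sigma_e ** 2)) / (
                2.0 * math.pi * sigma_e ** 2
            )
    return total


def accept(weight, theta):
    """Step (b): accept the hypothesis if its weight exceeds theta."""
    return weight > theta


# One projected point with sigma_e = 2.5; one image point on top of it and
# one far away that contributes nothing.
example_weight = hypothesis_weight([((0.0, 0.0), 2.5)], [(0.0, 0.0), (100.0, 100.0)])
print(example_weight, accept(example_weight, 0.01))
```

A point lying exactly on a projected model point contributes the peak value $1/(2\pi\sigma_e^2)$, and any point farther than $2\sigma_e$ from every projected point contributes zero.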

For our experiment, we loop through thresholds from 0 to $E_H(v)$, and for every threshold we run the above algorithm enough times to get 100 sample points. To test the probability of false alarm, we run the same experiment exactly, except we use random images which do not contain the model we are looking for. We loop through the same thresholds as in the previous case to get a set of $(P_F, P_D)$ pairs for each threshold. The resulting $P_F$, $P_D$, and ROC curves are shown in Figure 6 for $n - 3 = 10, 100, 500, 500$, occlusion $c = 0.0, 0.0, 0.0, 0.25$. The ROC curves for the same parameters are shown alongside.
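The threshold sweep described above amounts to counting, for each threshold, the fraction of trials whose weight exceeds it. A minimal Python sketch follows; the function name and the tiny hand-made weight lists are illustrative assumptions and are not data from the experiment.

```python
def empirical_roc(weights_with_model, weights_without_model, thresholds):
    """Return a list of (theta, P_F, P_D) triples.

    P_D is the fraction of model-present trials whose weight exceeds theta;
    P_F is the same fraction over the model-absent (random image) trials.
    """
    points = []
    for theta in thresholds:
        p_d = sum(w > theta for w in weights_with_model) / len(weights_with_model)
        p_f = sum(w > theta for w in weights_without_model) / len(
            weights_without_model
        )
        points.append((theta, p_f, p_d))
    return points


# Made-up weights, four trials per condition, purely to show the mechanics.
with_model = [0.5, 0.6, 0.7, 0.8]
without_model = [0.1, 0.2, 0.3, 0.55]
curve = empirical_roc(with_model, without_model, [0.0, 0.4, 0.9])
for theta, p_f, p_d in curve:
    print(f"theta = {theta:.1f}  P_F = {p_f:.2f}  P_D = {p_d:.2f}")
```

Plotting the second and third components of each triple against each other gives the empirical ROC curve for that $(m, n)$ pair.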

In the cases of no occlusion, the predicted and empirical curves match very nicely. However, for occlusion = 0.25, the empirical ROC curve falls below our expectations. This is due to the fact that the distribution of $W_H$ has a larger variance than our predicted value (see the table in Figure 3, and Figure 7). In fact, though we assumed at the outset of the analysis that the individual random variables $V_H$ were independent, this is not the case; for a correct basis matching, the joint distribution of any two error vectors $e_i$, $e_j$, $i, j \neq 0, 1, 2$, can be shown to have a non-zero covariance:

$$\Lambda_{ij} = (1 - \alpha_i - \beta_i)(1 - \alpha_j - \beta_j) + \alpha_i \alpha_j + \beta_i \beta_j$$

This leads to a larger variance for the overall distribution than that predicted using the independence assumption and hence poorer results. We are currently working on another analysis that takes this dependence into account.

Figure 6: Comparison of predicted to empirical curves for probability of false alarm, probability of detection, and ROC curves. From top to bottom, $n - 3 = 10, 100, 500, 500$, occlusion = 0, 0, 0, 0.25.

11 Conclusion 

The geometric hashing method was introduced by Lamdan, Schwartz and Wolfson in 1987. The first error analysis of the geometric hashing technique was done by Grimson, Huttenlocher and Jacobs, who showed that with even very small amounts of noise and spurious features, the technique had a very high probability of false positives. However, they assumed that the error was uniform and bounded, which is a worst-case scenario and places an upper bound on the error rate. As we have shown here, with a Gaussian error assumption we can do much better.

Costa, Haralick, and Shapiro demonstrated another error analysis for geometric hashing [CHS90] also based on a 2D Gaussian noise distribution associated with each point. Their analysis differs from this one technically in many respects, but the main difference is that they assume that the model they are looking for is present in the image and they focus on finding the pose by deriving an optimal voting scheme. This is in contrast to the work presented here, in which given a voting scheme and no prior information about the presence or absence of the model, we explicitly derived the probability of false detection as a function of clutter, and characterized the confidence level of the hypotheses that the method offers as "correct". We did this by choosing a hypothesis evaluation function, and deriving the probability density of the evaluation function on both correct and incorrect hypotheses to determine, when given some hypothesis, which distribution it was drawn from. We showed also that the Gaussian error model separates the two distributions more than the uniform bounded error model, leading to better ROC curves.

The contribution of this work is to cast the geometric hashing technique in terms of standard estimation theory, which has several advantages. The ROC curve formulation explicitly demonstrates the performance achievable for a given signal to noise ratio as a function of acceptance threshold. Given a desired detection rate, the user can determine from the ROC curve what acceptance threshold to use in order to minimize the probability of false detections. In this formulation it is also clear when adequate performance cannot be achieved, for if the desired minimum performance point $(P_F, P_D)$ lies above the ROC curve for a particular clutter level, then this performance is not possible no matter what operating parameters are chosen. The ROC formulation is also a succinct method for comparing voting schemes, as we compared the voting schemes implied by the Gaussian versus uniform error models. We expect to be able to use such techniques to choose thresholds analytically instead of heuristically in recognition systems.
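As an illustration of choosing a threshold analytically, the Python sketch below picks the weight threshold that achieves a desired detection rate and reports the resulting false alarm rate. It assumes that both weight distributions are well approximated by normal densities and that a simple one-sided threshold on the weight is used, as in step (b) of the experiment; the function names are ours, and the moments are the predicted entries for $m - 3 = 10$, $n - 3 = 100$ from Figure 3.

```python
from statistics import NormalDist


def threshold_for_detection_rate(p_d, mean_h, var_h):
    """Threshold theta with Pr(W_H > theta) = p_d under a normal model."""
    return NormalDist(mean_h, var_h ** 0.5).inv_cdf(1.0 - p_d)


def false_alarm_rate(theta, mean_hbar, var_hbar):
    """Pr(W_Hbar > theta) for an incorrect hypothesis, normal model."""
    return 1.0 - NormalDist(mean_hbar, var_hbar ** 0.5).cdf(theta)


# Predicted moments from Figure 3 for m - 3 = 10, n - 3 = 100.
MEAN_H, VAR_H = 3.5052e-2, 1.6485e-4        # with the model present
MEAN_HBAR, VAR_HBAR = 3.1940e-3, 2.0668e-5  # without the model

theta = threshold_for_detection_rate(0.95, MEAN_H, VAR_H)
p_f = false_alarm_rate(theta, MEAN_HBAR, VAR_HBAR)
print(f"theta = {theta:.5f}, P_F = {p_f:.4f}")
```

Because the empirical variances with the model present are larger than predicted (Figure 3), a threshold chosen this way should be read as an optimistic estimate.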
Figure 5: Comparisons of uniform and Gaussian error models for $m - 3 = 10$, $n - 3 = 10, 50, 100, 500, 1000$. From top: uniform, occlusion = 0; Gaussian, occlusion = 0; uniform, occlusion = 0.25; Gaussian, occlusion = 0.25.